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Abstract 

Here we introduce a new design framework for synthetic biology that exploits the advantages of 
Bayesian model selection. We will argue that the difference between inference and design is that in 
the former we try to reconstruct the system that has given rise to the data that we observe, while 
in the latter, we seek to construct the system that produces the data that we would like to observe, 
i.e. the desired behavior. Our approach allows us to exploit methods from Bayesian statistics, 
including efficient exploration of models spaces and high-dimensional parameter spaces, and the 
ability to rank models with respect to their ability to generate certain types of data. Bayesian 
model selection furthermore automatically strikes a balance between complexity and (predictive 
or explanatory) performance of mathematical models. In order to deal with the complexities of 
molecular systems we employ an approximate Bayesian computation scheme which only requires 
us to simulate from different competing models in order to arrive at rational criteria for choosing 
between them. We illustrate the advantages resulting from combining the design and modeling 
(or in-silico prototyping) stages currently seen as separate in synthetic biology by reference to 
deterministic and stochastic model systems exhibiting adaptive and switch-like behavior, as well as 
bacterial two-component signaling systems. 

1 Introduction 

As we are beginning to understand the mechanisms governing biological systems we are starting to 
identify potential ways of guiding or controlling the behavior of cellular and molecular systems. Ra- 
tionally reengineering organisms for biomedical or biotechnological purposes has become the central 
aim of the fledgling discipline of synthetic biology. By redirecting regulatory and physical interactions 
or by altering molecular binding affinities we may, for example, control metabolic processes [U [2] 
or alter intra and inter cellular communication and decision making processes [31 0]. The range of 
potential applications of such engineered systems is vast: designing microbes for biofuel production 
[5j [6] and bioremediation [7] ; developing control strategies which drive stem cells through the various 
decisions to become terminally differentiated (or back) [8j [9] , with the aim of developing novel thera- 
peutics [TT] ; construction of new drug-delivery systems with homing microbes delivering molecular 
medicines directly to the site where they are needed [12]; use of bacteria or bacterial populations 
(employing swarming and quorum sensing) as biosensors |13| ; and gaining better understanding of all 
manner of biological systems by systematically probing their underlying molecular machinery. 

A range of tools and building blocks for such engineered biological systems are now available which 
allow us to, at least in principle, build such systems from simple and reusable biological components 
|14j . In electronic systems, such modularity has been crucial and has allowed the cost-effective produc- 
tion of reliable components that can be combined to produce desired outputs. Biology, however, poses 



different and novel challenges that are intimately linked to the biophysical and biochemical properties 
of biomolecules and the media in which they are suspended. Especially in crowded environments such 
as found inside living cells the lack of insulation between different components, i.e. the very real 
possibility of undesired cross talk, can create problems; with increasing miniaturization similar, albeit 
quantum effects, are now also surfacing in electronic circuits [15J. 
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Figure 1: Bayesian approach to system design. A) The design objectives are encoded by the speci- 
fication of input and output characteristics. B) One or more competing designs for the system are 
specified together with priors on the parameters. A distance function, A(x, O), relates model output 
to the desired output characteristic. C) The system is evolved using sequential Monte Carlo. Each 
population more accurately approximates the desired behavior. D) The model posterior probability 
encodes the ability of each design to achieve the desired behavior. The parameter posterior shows 
parameters that are sensitive or insensitive to the input-output specification. 



As synthetic biology gears up to bring engineering methods and tools to bear on biological problems 
the way in which we manipulate biological systems and processes is likely to change. Historically, each 
new branch of engineering has gone through a phase of what can be described as tinkering before 
rationally planned and executed designs became common place. Arguably, this is the current state 
of synthetic biology and it has indeed been suggested that the complexity of synthetic biological 
systems over the past decade has reached a plateau [H] . From the earliest days, explicit quantitative 
modeling of systems has been integral to the vision and practice of synthetic biology and it will 
become increasingly important in the future. The ability to model how a natural or synthetic system 
will perform under controlled conditions must in fact be seen as one of the hallmarks of success of the 
integrative approaches taken by systems and synthetic biology. 

Here we present a statistical approach to the design of synthetic biological systems that utilizes 
methods from Bayesian statistics to train models according to specified input-output characteristics. 
It incorporates modeling and automated design and is general in the sense that it can be applied 
to any system that can be described by a mathematical model which can be simulated. Because of 
the statistical nature of this approach, previously challenging problems such as handling stochastic 
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models, accounting for kinetic parameter uncertainty and incorporating environmental stochasticity 
can all be handled in a straightforward and consistent manner. 

2 Bayesian approach to system design 

The question of how to design a system to perform a specified task can be viewed as an analogue to 
reverse engineering. In design we want to elucidate the most appropriate system to achieve our design 
objectives; in reverse engineering we aim to infer the most probable system structure and dynamics 
that can give rise to some observational data. In this respect, the design question can be viewed as 
statistical inference on data we wish to observe. 

In the Bayesian approach to statistical inference the posterior distribution is the quantity of in- 
terest and this is given by the normalized product of the likelihood and the prior. In most practical 
applications the posterior distribution cannot be derived analytically but if the likelihood (and prior) 
can be expressed mathematically we can use Monte Carlo methods to sample from the posterior. In 
many cases where the model structure is complex the likelihood cannot be written in closed form 
and traditional Monte Carlo techniques cannot be applied. These include inference for the types of 
stochastic processes encountered in systems and synthetic biology. In these cases a family of tech- 
niques known collectively as approximate Bayesian computation (ABC) can be applied: these use 
model simulations to approximate the posterior distribution directly. Here we use a sequential Monte 
Carlo ABC algorithm known as ABC SMC to move from the prior to the approximate posterior via a 
series of intermediate distributions |17j . This framework can also be used to perform Bayesian model 
selection [18J and has been implemented in the software package ABC-SysBio [19J. 

Figure [T] depicts the approach presented here. The design objectives are first specified through 
input-output characteristics. Here these have been depicted as a single time series, though the method 
can be applied in a much broader sense with multiple inputs and outputs. A set of competing designs 
is then specified through deterministic or stochastic models each containing a set of kinetic parameters 
and associated prior distributions. The distance function measures the discrepancy between the model 
output and the objective. In principal it is possible to specify a distribution over the objective and each 
model could also contain experimental error. The ABC SMC algorithm then automatically evolves 
the set of models towards the desired design objectives. The results are a set of posterior probabilities 
representing the probability for each design to achieve the specified design objectives in addition to 
the posterior probability distribution of the associated kinetic parameters. This approach is similar in 
spirit to some existing methods for the automated design of genetic networks such as those adopting 
evolutionary algorithms |20|, [2T] , Monte Carlo methods [22L [23] or optimization |24l [25l [26] but the 
advantages of our method over traditional ones are that we can utilize powerful concepts from Bayesian 
statistics in the design of complex biological systems, including 

• the rational comparison of models under parameter uncertainty using Bayesian model selection 
which automatically accounts for model complexity (number of parameters) and robustness to 
parameter uncertainty 

• a posterior distribution over possible design parameter values that can be analyzed for parameter 
sensitivity and robustness and provide credible limits on design parameters 

• the treatment of stochastic systems at the design stage including the design of systems with 
required probability distributions on system components. 

• methods for the efficient exploration of high dimensional parameter space 

In the following we demonstrate the power of this approach by examining, from this new perspec- 
tive, systems that have been of interest in the recent literature. First we consider systems that are 
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Figure 2: Biochemical adaptation. A) 11 networks capable of biochemical adaptation. A,B,C rep- 
resent enzymes that catalyze reactions in their active state. For example A — > B indicates that A 
converts B from its inactive to active state and A H B indicates that A converts B from its active 
to inactive state. The input is applied to species A and the output is taken to be the concentration 
of the active form of C. The concentrations of active and inactive forms sum to 1. Reactions with 
no origin node refer to background activating/deactivations enzymes. B) Posterior probability for 
achieving biochemical adaptation when there is no cooperativity. C) Posterior probability for achiev- 
ing biochemical adaptation when cooperativity is included. The error bars in panels B and C indicate 
the variability in the marginal model posteriors over three separate runs. D) Parameter posterior 
distribution, represented by univariate and bivariate marginal distributions, for model 11 in the case 
of no cooperativity. 



capable of biochemical adaptation [27]. We then look at the ability of two bacterial two component 
system (TCS) topologies to achieve particular input-output behaviors; and finally we finish with an 
analysis of designs for a stochastic toggle switch with no cooperative binding at the promoter. 



3 Biochemical adaptation 

Biochemical adaptation refers to the ability of a system to respond to an input signal and return to the 
pre-stimulus steady state (Figure [l]A.). Ma et al [27] identified two three- node network topologies that 
are necessary for biochemical adaptation: a negative feedback loop with a buffering node (NFBLB) 
and an incoherent feedforward loop with a proportioner node (IFFLP). Within these categories they 
identified eleven simple networks that were capable of adaptation (Figure[2]A). We applied the Bayesian 
design approach to these eleven networks using Michaelis-Menten kinetic models with and without 
cooperativity (see appendix B for the ordinary differential equations (ODEs) describing these models). 
The desired output characteristics were defined through the adaptation efficiency, E, and sensitivity, 
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where Ji, I 2 are the input values (here fixed at 0.5 and 0.6 respectively), Oi, 2 are the output steady 
state levels before and after the input change and O pea k is the maximal transient output level. We 
defined the two component distance to be e = such that as e decreases the behavior ap- 

proaches the desired behavior. The final population was defined to obey the toletances e = {0.1, 1.0}, 
which defines close to perfect adaptation (when 0± — O2 < Oi/50) and a fractional response equal to 
the fractional change in input. 

The results of the model selection are shown in Figure [2] (B and C). When cooperativity is not 
included the most robust designs for producing the desired input-output characteristics are the inco- 
herent feedforward loops, but when cooperativity is added the posterior shifts significantly towards 
the negative feedback topology. If a system with these requirements were to be implemented then 
not only would designs 11 and 4 be clear candidates for further study, but many of the designs can 
be effectively ruled out and the ranking of the models provides a clear strategy for an experimental 
programme. These results also illustrate how small changes in context or incomplete understanding 
of a system can produce a large change in the most robust design. The Bayesian framework allows 
us to incorporate such uncertainty — or safeguard against our ignorance — naturally into the design 
process. 

The posterior distribution provides information on which parameters are correlated and which are 
the most sensitive to the desired behavior. The posterior for model 11 under no cooperativity is shown 
in Figure [2p, where the ODE model is given by 

= IklA {l-A) + K IA ~ FAkFA AVK^ A 
dB (1--B) „ , B 

^ = AkAB (l-B) + K AB ~ B~+1Kfb 

~*t = AkAC {i-c) + K AC ~ BkBC cTK^<> 

where X = {A,B,C} corresponds to the concentrations of the active forms of proteins (1 — X cor- 
responds to the concentrations of the inactive form). / represents the input signal and the k and 
K represent the reaction rate parameters (of which there are 12 in total). F A and Fb represent 
background deactivating enzymes with concentration fixed to 0.5. 

The posterior shows in particular that the parameters for the background deactivating enzyme {kFB 
and Kfb) on node B are correlated, and that Kfb should be large; this is exactly the requirement for 
the linear regime necessary for the IFFLP system to achieve adaptation [27J. A principal component 
analysis of the posterior (Figure SI) shows other correlated parameters on the first few principal 
components. The last principal component describes the direction of least variance and therefore the 
most sensitive parameters. From this we can deduce for example that the reaction between nodes A 
and C is relatively unimportant. A similar analysis for model 4 in the case of cooperativity (Figures 
S2 and S3) shows for example that the behavior is insensitive to the values of the Hill coefficients and 
the details of the reaction between nodes A and C. 

4 Robust oscillator design 

Biochemical oscillations are increasingly being implemented in various synthetic systems |28t [29l [30~1 
151] . A recent study by Tsai et al [32] compared the ability of five small networks to achieve oscillations. 



5 



The five designs are shown in Figure [3]A. where each node represents the active form of a protein, edges 
represent enzymatic reactions and thicker edges represent increased feedback strength. We applied 
our Bayesian design methodology to the original problem and further investigated the ability of these 
designs, provided in detail in appendix C, to achieve particular amplitude-frequency values. 





Figure 3: Robust oscillator models. A) 5 oscillator models. Model 1 is a loop of repressive enzymatic 
reactions. Models 2 and 3 have an additional positive feedback loop on node A with the feedback 
strength stronger in model 3 (represented by the thicker loop). Models 4 and 5 have an additional 
negative feedback loop on node A with the feedback strength stronger in model 5. B) Posterior 
probability for achieving Hopf bifurcation type limit cycle oscillations. C) Posterior probabilities for 
species A achieving oscillations with amplitude of 0.1 and a frequency of 1Hz. D) Posterior probabilities 
for species C achieving oscillations with amplitude of 0.1 and a frequency of 1Hz. The error bars in 
panels B, C and D indicate the variability in the marginal model posteriors over three separate runs. 



Figure [3)3 shows the posterior probability for each model to achieve limit cycle behavior induced 
by a Hopf bifurcation. The addition of the negative feedback loop in models 4 and 5 does not improve 
the ability to achieve oscillations. We find that the addition of a positive feedback loop on species A 
in models 2 and 3 increases the ability of the system to achieve limit cycle behavior, but no significant 
increase in the posterior probability is provided by increasing the feedback strength. This is in conflict 
with the original study that found that model 3 outperformed model 2 [32]. Our approach does 
sample parameter space predominantly in regions where the desired behavior is more likely, rather 
than entirely at random as was done in the previous study; on balance this suggests that the posterior 
probability for delivering robust oscillations is approximately the same for models 2 and 3. 

More insight can be gained into this discrepancy by specifying a particular frequency and amplitude 
of the oscillator as the desired output behavior. Figures[3p and D show the model posterior probability 
after requiring an amplitude of 0.1 and a frequency of 1.0 Hz on species A and C respectively. The 
first thing to note is that the model posterior is significantly different in the two cases. When the 
constraints are applied to species A, model 3 is favored with the increase in feedback strength decreasing 
the ability to reach the specified behavior. When the conditions are applied to species C (and species B 
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by symmetry) we get a posterior that more resembles the original findings; that the increase in feedback 
strength does indeed increase the ability to reach the specified oscillations. Thus the posterior for the 
Hopf bifurcation behavior represents a sum over all possible oscillator characteristics; in a manner 
that is reminiscent of Bayesian model averaging. 

If we examine the posterior distribution and the principal component analysis for model 2 to 
achieve Hopf bifurcations (Figures S4 and S5), we can see that the parameters k\ and k$, which 
are the strengths of the deactivating reactions on nodes A and B, are constrained to be similar in 
magnitude to k§. We also see that within this model the feedback strength, kj, does not affect the 
dynamics significantly. Here, and elsewhere, we can use the posterior distributions in order to gain 
insights into the sensitivity and robustness of the system to variations in parameters, irrespective of 
whether the system's dynamics are deterministic or stochastic: our ABC SMC framework allows us 
to extract such information on the fly as part of the sequential design process. 

5 Bacterial two component systems 

Two component systems (TCS) allow bacteria to sense external environmental stimuli and relay in- 
formation into the cell, e.g. to the gene expression apparatus. They consist of a histidine kinase 
(HK) that autophosphorylates upon interaction with a specific stimulus. The phosphate group is then 
passed on to a cognate response regulator protein (RR) which, once phosphorylated, can regulate 
transcription [33J. 

Naturally occurring TCS differ in the number of phosphate binding domains. In the most common 
form there are two phosphate binding domains (Figure |4]A.) but an alternative form exists that consists 
of a phosphorelay mechanism with four phosphate binding domains (Figure[4|3). These shall be referred 
to as the orthodox and unorthodox TCSs, respectively [33] • The reason for the existence of two forms 
of TCS remains largely unknown but it has been demonstrated that the phosphorelay is robust to 
noise and can provide an ultrasensitive response to stimuli |34| I35j . whereas the orthodox system 
can provide behavior that is independent of the concentrations of its components [36] . Here we have 
applied the Bayesian approach to directly compare the ability of orthodox and unorthodox designs to 
achieve various input-output behaviors, using ODE models similar to ones described previously [34J 
(see appendix D). 

Figures |4p-F show four types of behaviors that may be desired in synthetic TCS systems (e.g. for 
bioremediation or biopharmaceutical applications), and the corresponding posterior probabilities of 
the orthodox and unorthodox models to achieve them. In Figure |4p the specified behavior is that of 
a fast response to a square pulse input signal. That is the output should show a maximum within 0.1 
seconds after the pulse starts and a minimum 0.1 seconds after the pulse ends. As can be seen from the 
posterior probabilities, both models achieve this behavior easily, as one would expect from a signaling 
system, with the orthodox system slightly outperforming the unorthodox system. In Figure |4p the 
ability of the two systems to achieve a steady output state for t > 2 seconds under a constant input 
signal is examined, and again both systems perform comparably with the orthodox system appearing 
slightly more favorable. 

In Figure [4)5 the input signal is a high frequency sinusoid with a mean of 0.6, and the desired 
output is a constant signal with the same mean and root mean square < 0.3; this would mimic a 
system that is robust to high-frequency noise. The output trajectories at some intermediate and final 
populations are shown in Figure S6. In this example the unorthodox system clearly outperforms 
the orthodox system, which indicates the increased robust to noisy signals that comes with the relay 
architecture [33]. But the direct comparison of the two models' ability to cope with noise, which is 
becoming possible in this approach, also reveals some unexpected characteristics: inspection of the 
posterior distribution (Figure S7) shows that all the dephosphorylation reaction rates, kQ,ks,kg, are 
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Figure 4: Bacterial two component systems. A) Orthodox system where HK denotes the histidine 
kinase and RR the response regulator, both of which have phosphorylated forms, HKp and RRp. 
Arrows represent reactions involving phosphate groups and the ki represent the rate parameters. S 
is the input stimulus signal that causes autophosphorylation of the histidine kinase. B) Phosphorelay 
system with three phosphate-binding domains, where H, D refer to histidine and aspartate domains 
respectively. C-F) Specified input-output behavior (above) and posterior probabilities for the two 
designs to achieve it (below). The input signal corresponds to the stimulus, S, and the output signal 
is represented by the concentration of phosphorylated response regulator, RRp. The error bars indicate 
the variability in the marginal model posteriors over three separate runs. 



minimized while the rate of the signal induced autophosphorylation {k\) is large. Thus the noise 
reduction mechanism in the unorthodox system works by saturating the system. 

In Figure [4^ the input is again a step function but the output is more specific; it much reach to 
> 0.8 and drop to < 0.2 within 0.5 seconds of the pulse start and end, respectively, thus approximately 
reproducing the input. Figure S8 shows the evolution of the system in this case. Here the orthodox 
model clearly outperforms the unorthodox model. Inspection of the posterior distribution (Figure S9) 
shows that both the rate of the signal induced autophosphorylation and the rate of phosphorylation of 
the response regulator by the histidine kinase, fei, &2, are large while the rate of dephosphorylation of 
the response regulator, k$, is small. This ensures that the shape of the signal is transferred faithfully 
through the system. 

6 Stochastic genetic toggle switch without cooperativity 

The genetic toggle switch is a synthetic realization of a bistable switch that forms the basis of cellular 
memory [37J. It is formed by two genes A and B, whose respective proteins repress the production 
of the other protein; protein A represses the production of protein B and vice versa (Figure [5]). The 
presence of an interaction with inducer molecules allows the system to switch between steady states 
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with the probability of spontaneous switching low enough such that, in the absence of an interaction, 
the system will effectively remain in its appropriate steady state indefinitely. 

Here we consider four versions of the stochastic genetic toggle switch that are all bistable without 
the requirement for cooperative binding of the proteins to the gene promoter [38j. Note that the 
deterministic models are not necessarily bistable; these are shown in Figure [5|^ and consist of the 
basic toggle switch, an exclusive version containing only one promoter, a version that includes bound 
repressor degradation (BRD), and a version containing a protein-protein interaction between A and 
B with the resulting complex nonfunctional. The additional reactions are are always to reduce the 
probability of the 'deadlock' state where both A and B are bound to the promotors of B and A 
respectively [38J. We modeled the switches using a continuous time Markov jump process which 
obeys the chemical master equation. Only protein level reactions were modeled which makes the 
models simpler (and faster to simulate) while retaining the important behavior. The stochastic models 
for all four switches are given in appendix E. For such complicated stochastic dynamical systems 
the advantages of a Bayesian perspective over conventional model design strategies (based e.g. on 
optimization) come to the fore: without an appreciation of the whole distribution choice of the best 
model would be subject to considerable uncertainty. 




time model 



Figure 5: Stochastic toggle switch. A) Four different designs for a toggle switch without cooperative 
binding. Going clockwise from top left we have the basic switch, the exclusive switch where there is 
only one repressor site, the basic switch with bound repressor degradation (BRD), and the basic switch 
with a protein-protein interaction. Genes gA,gB express proteins A,B and S, R represent inducer 
molecules. The species AB represents complex formation. The rate constants for the reactions are 
shown for the exclusive switch (protein degradation and transcription factor dissociation from the 
promoter are not shown). B) The specified input-output behavior. C) The posterior probabilities for 
each model to achieve the toggle switch behavior. D) Parameter posterior distribution, represented 
by univariate and bivariate marginal distributions, for model 2 (exclusive switch). 
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Figure [5p shows the desired toggle switch behavior. The inducer S is added between t = 10 and 
t = 30, after which the level of protein B should reach a steady state with a mean number of 40. 
Between t = 60 and t = 80 the inducer R is added after which the level of protein B should drop 
to zero. The inducer numbers are both assumed to be 40 molecules which is fixed in the specified 
range. The desired output behavior was specified via the two component distance metric, defined to 
be e = {di,d 2 }, 



do 



n, 



rip 



where xt is the number of protein B at time t, y is the target (here fixed at 40), a = {t : 30 < t < 60}, 
(3 = {t : < t < 10 and 80 < t < 100}, n a = #q and np = #f3. The final population was defined to 
be e = {7.0,0.05}. Here d\ represents the distance in the "on" region and efo represents the distance 
in the "off" region. 

Figure S10 shows the evolution of the stochastic simulations towards the desired behavior. Figure 
[5p shows the posterior probabilities for each system to achieve the specified behavior, and in particular 
demonstrates that the exclusive toggle switch outperforms all the others. This chimes with intuition, 
since the exclusive switch removes the possibility of the deadlock state without the addition of any 
extra reactions. The fact that the BRD switch performs worse than the original toggle shows that the 
addition of the two extra degradation reactions does not offer a great enough performance increase for 
the extra parameters, a manifestation of the parsimony principle or Occam's razor which is inherent 
to the Bayesian model selection framework used here. 

The reactions that comprise the exclusive switch are given by 
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where gA, gB represent the gene promotors for protein A, B respectively and are fixed at one copy, 
A — gB, B — gA represent the bound transcription factors and S, R represent the inducer molecules. 
The P species, fixed to be one copy, ensures only A or B can be bound at any one time. Examination 
of the posterior distribution for this model, Figure [5p, clearly shows a large correlation between &3 
and &4, which are the production and decay rates of protein B, respectively. This is clearly seen in 
the principal components (Figure Sll) through the combination &3 + k$ dominating the first PC and 
the combination — k% dominating the last PC. Thus the system is sensitive to only the difference 
in these rates which is typical in birth-death processes. 



7 Discussion 

In this paper we have presented a new method for the design of synthetic biological systems employing 
ideas from Bayesian statistics. We have demonstrated its utility and generality on three different 
systems spanning biochemical, signaling and genetic networks, as well as oscillatory systems. This 
method has advantages over traditional design approaches in that the modeling is incorporated directly 
into the design stage. The statistical nature of the method has many attractive features including the 
handling of stochastic systems, the ability to perform model selection and the handling of parameter 
uncertainty in a well defined manner. We used the ABC-SysBio software [19] which takes as input a 
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set of SBML files and as such can be used by bioengineers and experimentalists to rationally compare 
their competing designs for a system. By using this method we hope that the implementation time of 
synthetic systems can be reduced by defining a program of experimental work based on the posterior 
probabilities of each design. 

Monte Carlo sampling of parameter spaces has been used to assess the robustness of engineered 
and biological systems in the past. But like in the statistical case, simple Monte Carlo sampling 
tends to waste too much effort and time on those regions which are of no real interest for reverse- 
engineering or design purposes. Our statistically based sequential approach homes in onto those regions 
where the probability of observing the desired behavior is appreciable. This allows us a more nuanced 
comparative assessment of different design proposals, especially when dynamics are expected (or indeed 
desired) to exhibit elements of stochasticity. And the Bayesian model selection approach automatically 
strikes a balance between the systems' abilities to generate the desired behavior effectively but also 
robustly. 

Further developments will include the incorporation of methods for model abstraction to reduce 
computation time |39j and to handle a database of standard parts as in other existing design software 
systems [40, 41J. Moreover, it is also possible to include the generation of novel structures (by e.g. 
using stochastic-context free grammars [42] to propose alterations to a reaction/interaction network) 
as part of the design process. Just like in the case where ideas from control engineering and statistics 
can gainfully be combined in order to reverse-engineer the structure of naturally evolved biological 
systems, we feel that in the design of synthetic systems such a union will also be fruitful. 



A Approximate Bayesian Computation : ABC SMC 

Here we outline the background behind approximate Bayesian computation (ABC) and describe the 
ABC SMC algorithm [IT], which is implemented in the software package ABC-SysBio [19]. ABC 
methods have been developed to infer posterior distributions in cases where likelihood functions are 
computationally intractable or too costly to evaluate. They replace the calculation of the likelihood 
with a comparison between observed and simulated data. 



A.l Background 

Let 9 G be a parameter vector with prior n(9) and f(y\9) be the likelihood of the data y G T>. In 
Bayesian inference we are interested in the posterior density 

nm _ m«M<» 
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Now imagine the case where we cannot write down the likelihood in closed form but we can simulate 
from the data generating model. We can proceed by first sampling a parameter vector from the prior, 
9* ~ 7r(0), and then sampling a data vector, x* , from the model conditional on 9*, ie x* ~ f(x\9*). 
This alone gives the joint density ir(9,x). To obtain samples from the posterior distribution we must 
condition on the data y and this is done via an indicator function, i.e. 

7r(9)f(x\9)I Ay (x) 

n(9,x\y) - 



where Ib(z) denotes the indicator function and is equal to 1 for z G B. Here A y = {x G V : x = y}, so 
the indicator is equal to one when the simulated data and the observed data are identical. This forms 
a rejection algorithm, and in this instance the accepted 9* are from the true posterior density ir(9\y). 
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For most models it is impossible to achieve simulations with outputs in the subset A y and so an 
approximation must be made. This is the basis for ABC. In the first instance we can replace A y by 
Ay te = {x G T> : p(x,y) < e} where p : T> x T> — >• M + is a distance function comparing the simulated 
data to the observed data. We then have 



where n e is an approximation to the true posterior distribution. The rationale behind ABC is that if e 
is small then the resulting approximate posterior, 7r e , is close to the true posterior. Often, for complex 
models or stochastic systems, the subset A ye is still too restrictive. In these cases we can resort to 
comparisons of summary statistics. We now specify the subset A y ^,e = {x £ V : ps(x,y) < e} where 
rj : T> — > S is a summary statistic and the distance function now takes the form p$ ■ S x S — > M + . We 
often write the marginal posterior distribution as n(9\p(x* ,y) < e). 

A.2 ABC SMC 

The simplest ABC algorithm is known as the ABC rejection algorithm |43j and proceeds as follows 

Rl Sample 9* from ir(8). 

R2 Simulate a dataset x* from f(x\9*). 

R3 If p(x*,y) < e accept 9*, otherwise reject. 

R4 Return to Rl. 

This gives draws from n e but can be very inefficient in high dimensional models or when the overlap 
between the prior and posterior distributions is small. One way to improve the efficiency of the rejec- 
tion algorithm is to perform sequential importance sampling (SIS) [44j. In SIS, instead of sampling 
directly from the posterior distribution, sampling proceeds via a series of intermediate distributions. 
The importance distribution at each stage is constructed from a perturbed version of the previous 
population. This approach can be used in ABC and the resultant algorithm is known as ABC SMC 
[171 . Described here is a slightly modified version that automatically calculates the e schedule and as 
such, only the final value, ex, needs be specified. To obtain N samples {9 l , 9 2 , # 3 ...., 6 N } (known as 
particles) from the posterior, defined as, ir(9\p(x* ,y) < €t), proceed as follows 

SI Initialize e = oo 

Set the population indicator t = 

52.0 Set the particle indicator i = 1 

52.1 If t = 0, sample 9** independently from tt(9) 

If t > 0, sample 9* from the previous population {9\_{\ with weights Wt-i- 

Perturb the particle, 9** ~ Kt(9\9*) where Kt is the perturbation kernel. 

If tt(0**) = 0, return to S2.1 

Simulate a candidate dataset x* ~ f(x\9**). 

If p(x*,y) > e return to S2.1 

52.2 Set 9\ = 9** and d\ = p(x*,y), calculate the weight as 



Tr e (9,x\y) 



7T(9)f(x\9)I Ay ^x) 




If i < N, set i = i + 1, go to S2.1 



S3 Normalize the weights. 

Determine e such that Pr(dt < e) = 0.9. 
If e > e T , set t = t + 1, go to S2.0. 
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Here Kt(8\6*) is the component- wise random walk perturbation kernel that, in this study, takes the 
form Kt(0*\6) = 9 + U(—6,S) where 5 = ^range{#t_i}. The denominator in the weight calculation 
can be seen as the probability of observing the current particle given the previous population. 

A. 3 Model selection 

In Bayesian inference comparison of a discrete set of models can be be performed using the marginal 
posterior. Consider the joint space defined by (M, 9) G Ai x Qj^ ; Bayes theorem can then be written 

(M] _ f(y\M)ir(M) _ f(y\M)ir(M) 
7F( lV) ~ f M f(y\M')ir(M')dM' ~ T^JMM^WY 

where f(y\M), the marginal likelihood, can be written 

f(y\M) = f TT(8\M)f(y\8,M)d8. 

Therefore the posterior probability of a model is given by the normalized marginal likelihood which 
may or may not be weighted depending on whether the prior over models is informative or uniform 
respectively. It has recently been noted that model selection using summary statistics can be prob- 
lematic because the summary statistic must be sufficient for the joint space, {M, 9}, rather than just 
9 jl5]. This is not a concern here since in all our examples we use the full data set with no summary 
or we define our posterior distributions through the summary statistics. 

Model selection can be incorporated into the ABC framework by introducing the model indicator 
M and proceeding with inference on the joint space. For example, the ABC rejection algorithm with 
model selection [36] proceeds as follows 



MR1 Sample M* from tt(M). 

MR2 Sample 9* from tt{9\M*). 

MR3 Simulate a dataset x* from f(x\9*,M*). 

MR4 If p(x*,y) < e accept (M*, 9*), otherwise reject. 

MR5 Return to Rl. 

Once iV samples have been accepted an approximation to the marginal posterior, ir(M = m\y), is 
given by 

#accepted m 
n(M = m\y) = — 



N 

Model selection can also be incorporated into the ABC SMC algorithm |18| . To obtain N samples 
{(M, 6>)\ (M, 9) 2 , (M, 6>) 3 ...., (M, 9) N } from the posterior, defined as, ir(M,9\p(x* ,y) < e T ), proceed 
as follows 
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MSI Initialize e = oo 

Set the population indicator t = 
MS2.0 Set the particle indicator i = 1 

MS2.1 If t = 0, sample (M**, 6***) from the prior vr(M, 0) = 7r(M)7r(0|M). 

If t > 0, sample M* with probability P t -i{M*) and perturb M** ~ KM t (M|M*). 
Sample 0* from the previous population {0(M**)t_i} with weights u;t_i. 
Perturb the particle, 9** ~ ^t,M**(^|^*) where Kt^M is the perturbation kernel. 
If tt(M**,6**) = 0, return to MS2.1 
Simulate a candidate dataset x* ~ /(x|M**, (9**). 
If p(x*,y) > e return to MS2.1 
MS2.2 Set (M, 0)\ = (M**,8**) and df = p(x*,y), calculate the weight as 

1 if t = 
if t > 



where 

Si = EjeM Pt-mU)KM t {Ml\Mi_ x ) 
and 

^2 - Lfc 6 M'=M t _i P t _!(M t l =M t _i) 

If i < JV, set i = i + 1, go to MS2.1 
S3 Normalize the weights. 

Obtain the marginal model probabilities given by 

P t (M t = m) = E fc6 Mi=M 4 _ a wi(Mi,ei) 

Determine e such that Pr(dt < e) = 0.9. 
If e > e T , set t = t + 1, go to MS2.0. 



There are two obvious additions to the algorithm when compared to parameter inference. The model 
kernel, KMt, perturbs the resampled models using a multinomial distribution, and the additional 
term in the weight denominator accounts for the probability of observing the current model given the 
previous population. 

A. 4 Prior distribution 

The prior distribution encodes our knowledge of the system and should be set according to known 
biochemical properties. However, often the kinetic parameters are not well known and can be very 
difficult or even impossible to measure in vivo. In these cases we make the prior distribution non 
informative by specifying a large range over possible, biophysically and biochemically plausible values. 
As more information becomes available, through experimental studies or otherwise, the prior can be 
updated to reflect our increased knowledge of the system. Interestingly, for some systems, our design 
method could help to constrain kinematic parameters where experimental data are unavailable. 

A. 5 The distance function and output tolerance 

In system design we would rarely insist on achieving the true posterior distribution corresponding to 
e = 0, but would like to reach the objective within some tolerance. A theorem due to Wilkinson (2008) 
|47j states that if we assume that the data can be considered as 

V = V0) + e > 
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where r](9) is a draw from the model at the 'best' input and e is an additive, independent error, then 
the approximate posterior distribution, ir(9\p(x* ,y) < e) can be interpreted as the 'true' posterior 
ir(9\y). While the independence assumption is not always true, this theorem provides some insight 
into the relationship between the final e value and the tolerance on our specified behavior. For example 
when using uniform kernels, as in this study, if our desired output behavior is a constant of 0.5 and we 
finish the inference at e = 0.05 our final trajectories will be distributed C/(0.45, 0.55) giving a tolerance 
of ±10% on the output behavior. This can be used when considering our desired output objectives. 
To achieve other error distributions, such as Gaussian errors, we can always explicitly specify the error 
model in the design objectives. 



A. 6 Deterministic models 

Inference for deterministic models such as ordinary differential equations can be problematic since 
there is a one to one relationship between the parameter vector 9 and the data set x. Therefore, 
in the absence of observational error, the posterior distribution resembles a delta function, 5(9 — 9) 
where x = f(9) is data 'closest' to y. An additional problem for ABC methods is that the minimum 
distance, p(x,y), is greater than zero [39]. However, in practice, observational data have associated 
experimental errors and when this is included explicitly in the model, the problem is resolved. In the 
case of systems design, we omit the explicit error model for clarity, but note that it could be included 
with assumptions on the form of the distribution. 



B Biochemical adaptation 
B.l Models 

We used the same models as those used in |27j . which are enzymatic reactions assuming Michaelis- 
Menten kinetics. Below we give the full models including cooperativity but the more specific case of no 
cooperativity is when the exponents, n,, are set to one. Here A, B, C denote the concentrations of the 
active form of the species and (1 — A), (1 — B), (1 — C) the concentrations of the inactive form. Species 
Ei and Fi refer to background activating and deactivating enzymes respectively and are assumed to 
have a constant concentration of 0.5. The models were simulated in the range < t < 200. 
Design 1 

dA (1 - A) niA A npA 
-JZ = Ik lAr, A^ nr . , T^n JA ~ F A k F A 



dt ~ (1 — A) niA + tA A n FA + Kp F / 
dB (l - B) n cB B n ™ 
~dt ~ CkcB (l-BrcB +K ^ B B ~ B npB + 

dC (l-C) nA c C ns c 

~JZ = A kACTi T^TTZ , TsriAC - Bk BC- 



dt AC (1 - C) nA c + K n A A P BC C n BC + K n R B r c 



Design 2 



dA (l-A) niA , A npA 



dt 1A (l- A) niA + KjX a A npA + Kp F A A 

dB (1 - B) nEB B n cs 



dt ~ ^B^B (1 _ B)nEB + K n B » ^^* B n CB + g*CB 

dC A1 (l-C) nAC „, C nBC „ , C npc 

~ Ak ACT, T^TTZ , rsn A n ~ Bk B C „„ n „ , „nnr. ~ F C k FC 



dt AO (1 - CY A c + Kffi m "C n Bc + KH? L ' ^ C n ™ + K 



-n F c 
FC 
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Design 3 



Design 4 



Design 5 



Design 6 



Design 7 



dA (1 — A) niA A n FA 

~dt = IklA (i- A yiA + K \ l < A A ~ FaUfa A^a + K n / A 

dB (1 - B) riBB _ B n CB 

— - h B k EB ^_ B y EB +K nEB ~ Ck CB BncB +K ncB 

dC (l-C) nB c C nA c 

~dt ~ BC {1 - C) n BC + K n B B c c ~ AC 'C n Ac + K n £? 

dA (1 — A) UlA Ji^ba 

lit = IklA (1 - A)*" + K n IA A ~ BkBA A^ba + K n B B A A 
dB _ (1 - B) nAB B h pb 

~dt ~ AB {I - B^ab + K n A A B B ~ B np B -\- Kp B B 

dC _ {l-C) n AC C np O 

~dt ~ AC {I - C) n Ac + R n A AC ~ C FC C n FC + K n / C c 



dA (1 — A) niA ji^ba 

It = IklA {\- A) n iA + K^ A A ~ BkBA A^ba + K n B BA 
dB _ (1 — By i AB B n BB 

~dt ~ AB (1 - B) n AB + K n A AB ~ B Ufb + Kp B B 

dC _ (1 - C) nE c C n Ac 

— - E C k EC (1 _ c y EC + K nvc - Ak AC CnAC - k uac 

dA (1 — A) n ' A js^ba 

It = IklA (l- AyiA + K™ 1 / ~ BkBA A^ba + K n B BA 
dB _ (1 - B ) n cs B n " B 

~dt ~ CB (1 - Byes + K n c CB ~ B FB B n F B +K n / B B 
dC (l-C) nAC c n ^ 



(\ _ ^4) n /A j\n BA J^ri FA 

IklA (l- A )n IA + K niA - BkBA a^ba + K n B B A A ~ FAkFA A^A + K n / A 

(\ _ B^n EB B'riCB 
E B k EB (1 _ B y EB + k ueb - Ck CB BncB - R ncB 

(I _ C) n AC (jn FC 

AkAC {I - C) n Ac + R n A A ? ~ F ° kFC C n Fc + R n / C c 



dA 

lit 

dB ^ , (l-B) riBB ^, B n ^ B 

~dt 

dC _ 

— - Ak ACTi nMlAn ■ ^ nAC - t C k F c nnvn ; ^ nFC 

Design 8 



^ M _ J^ n IA ^"BA ^4"FA 

d £ _ r , (1 - ^) ncfl , £™ FS 

di" ~ CB (i _ B) n cB + Kq C b ~ B FB B^fb + K n / B B 
dC _ (l-C) nEC C nAC 

— - E CkEC ^ _ c)nEC + K n E( C - AkAC C^AC + R n A AC 



16 



Design 9 



Design 10 



Design 11 



dA 
~dt 
dB 
dt 
dC 
dt 



dA 
~dt 
dB 
~dt 
dC 
~dt 



dA 
~dt 
dB 
~dt 
dC 
~dt 



IklA Jl 
Eb^eb 

E c k E c 



Ik IA 



(1 - A) n ' A 
- A) niA + KjX a 

(1 - B) nEB 
(1 - B) n EB + K n E E B B 

(1 - C) nEC 
(1 _ c) n Ec + K n E E c c 



Bk B A 



A n BA + K n n B A A 

Ck CB 



BA 

~Q n CB 



Ak AC 



(jriAC 



niA 



(1 - A) n ^ 

(1 - B) nAB 



v-niA 



= Ak A B 

= BkBC '(1 - C) n BC + K n B B c c 



' {I _ B )uab + K n AB l 
(1 - C) nBC 



FAkFA A^a + K n // 

- F * k ™ B n FB + K7 B 
(jriAC 

~ Ak AC CnAC ~ r uac 



IkiA 



{I -A) 



niA 



(1 



Ak AB 



Ak AC 



(1 



A)niA + Rj A A 

(1 - B) nAB 
- B) nA B + K n A AB 
(1-C) 



F A k FA 



j\nFA 



A n FA + 



"AC 



(1 - C) nA c + Kl AC 



F B k FB 



Bk BC 



'■FA 



B riFB + K n F FB 



(jri B c 



C n BC + K B ff 



B.2 Distance 



The two component distance metric was denned to be e = {E, S }, where E and S are the adaptation 
efficiency and sensitivity defined by 



E 



S 



(o 2 - o 1 )/o 1 



(h - h)/h 

{Opeak ~ Ox) /Ox 



(I 2 ~ h)/h 



where Ix, h are the input values (here fixed at 0.5 and 0.6 respectively), Ox, O2 are the output steady 
state levels before and after the input change and and O pea k is the maximal transient output level. 
The final population was defined to be e = {0.1, 1.0}. 

B.3 Priors 

The priors on the Michaelis-Menten rates were chosen to correspond to the parameter ranges used in 
the original study; log A; ~ U(—l, 1) and log if ~ U(—3, 2) 
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C Robust oscillator design 



C.l Models 

We used the same models as those used in [32J , simulated in the range < t < 10. Again A, B, C denote 
the concentrations of the active form of the species and (1 — ^4), (1 — B), (1 — C) the concentrations 
of the inactive form. The feedback is modeled using Michaelis-Menten kinetics but the conversion of 
inactive form into active form is assumed to have a constant rate. 
Design 1 

dA , , „ N k 2 C ni 
— = kUl -A)- * - A 
dt y ' K^+C^ 



Designs 2 and 3 



Designs 4 and 5 



jg =M1 - B) _ fa- 4 -- B 

dt dV ; K% 2 +A n * 

dC '-wi o keBn3 C 



dA , A , k 2 C ni , , , <x A ni 

~ A )- r,n? , 7^ A + k 7 (l - A) 



dt v ' K^+C"! v 'Kjf + A"* 

— -Hl-B) - R n 2+An2 B 

dC - ui -o- hBn3 C 



dA 1 ,1 ^ fo ™ 1 A 1 A ^ 



dt iy ' K? 1 + C n i K2 4 +A n * 
dB -kh m hAH2 P. 

~a ~ h{1 ~ B) ~ WTa^ b 

dC -ui o keBns c 
— -k^i-c)- Bn c 



C.2 Distance 

For the direct Hopf bifurcation detection the distance metric was defined to be 

n i (l-0.99exp(-|Im[A i ]|)) 

where Aj is the i th complex eigenvalue of the linearized system in the steady state. Here e = 
represents the location in parameter space where a limit cycle emerges through a Hopf bifurcation 
|48j . The final population was at e = 0.001. 

To investigate the ability to achieve particular amplitude- frequency values, the distance was defined 
as e = {^1,(^2,^3}, where 

di = / J \xt +nT — X to+ ( n _i) T \ 

n 

d 2 = \ft~f\ 

d 3 = I max x t> t - min x t>to - A t | , 
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and n is an integer, ft is the target frequency, / is the frequency determined from the largest component 
of the Fourier spectrum, At is the target amplitude and to is a cut to remove initial transients (= 2s). 
The final population was defined to be e = {0.05, 0.05, 0.05}. 



C. 3 Priors 

The priors were chosen to correspond to parameter ranges used in the original study; k\ ~ £7(0, 10), 
k 2 ~ 17(0,1000), k 3 ~ 17(0,10), k 4 ~ 17(0,1000), k 6 ~ 17(0,1000), k 7 ~ [7(0, 100), fcf ro " 9 ~ 
[7(500,600), nj ~ [7(1,4) and 7Q ~ [7(0,4) [32]. 

D Bacterial two component systems 

D. l Models 

The models we used were based on the ones found in [31] . All simulations were performed in the range 

< t < 10. 

Orthodox system 

We modeled the following reactions 



HK + S 




HKp + S 


HKp + RR 




HK + RRp 


HKp 


% 


HK 


HK + RRp 




HKp + RR 


RRp 




RR. 



Additionally we assumed that the total concentration of HKtot = HK + HKp and RRtot = RR+ RRp 
were equal to one. This resulted in the following ordinary differential equations 

= k 2 [HKp][RR] + h[HKp] - k 4 [HK][RRp] - ki[HK][S] 

d[RRp] 
dt 

Orthodox system 

We labelled the occupied states of the phosphorelay as 

HI Dl H2 



k 2 [HKp}[RR] - k 4 [HK][RRp] - k 5 [RRp\. 



HKi 


X 


X 


X 


HK 2 


o 


X 


X 


HK 3 


X 





X 


HK 4 


X 


X 


o 


HK 5 


o 





X 


HKq 


o 


X 


o 


HK 7 


X 





o 


HK S 


o 





o 



where HI, Dl and H2 are the binding domains on the Histidine Kinase and x, o represent an empty, 
occupied domain respectively. We modeled the following reactions 

HK + S ^ HKp + S 
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Again we assumed that the total concentration of HK to t = Yl HKi and RRtot = RR + RRp were 
equal to one. This resulted in the following ordinary differential equations 



dHKi 

dt 
dHK 2 

dt 
dHK 3 

dt 
dHK 4 

dt 
dHK 5 

dt 
dHK 6 

dt 
dHK 7 

dt 
dRRp 

dt 

D.2 Distance 

The distance functions for input-output behaviors ei_4 were defined to be 

ei = |lH(0)(argmax t a; t - 2.0), H(0)(argmin t x t - 4.0)} 

* = ./E(^-i-O) 2 



k 4 [HK 4 ][RR] + k 6 [HK 3 ] - k 7 [HKi][RRp] + k 8 [HK 2 ] - kllHK^S] 
k 4 [HK 6 ][RR] + k 6 [HK 5 ] - k 7 [HK 2 ][RRp] - k 8 [HK 2 ] + k^HK^S] - k 2 [HK 2 ] 

-k 3 [HK 3 ] + k A [HK 7 ][RR] + k 5 [HK 4 ] - k 6 [HK 3 ] - k 7 [HK 3 }[RRp] + k s [HK 5 ] - h [H K 3 ] [S] + k 2 [HK 2 ] 
k 3 [HK 3 ] - k 4 [HK 4 ][RR] - k 5 [HK 4 ] + k 6 [HK 7 ] + k 7 [HKi][RRp] + k s [HK 6 ] - k^HK^S] 
-k 3 [HK 3 ] + k 4 [HK 8 ][RR] + k 5 [HK 6 ] - k 6 [HK 5 ] - k 7 [HK 5 ][RRp] - k 8 [HK 5 ] + hiHK^S] 
k 3 [HK 5 ] - k 2 [HK 6 ] - k 4 [HK 6 ][RR] - k 5 [HK 6 ] + k 6 [HK 8 ] + k 7 [HK 2 ][RRp] - k 8 [HK e ] + k^HK^S] 
k 2 [HK 6 ] - k 4 [HK 7 ][RR] - k 6 [HK 7 ] + k 7 [HK 3 ] [RRp] + k 8 [HK 8 ] - k^HK^S] 
k 4 [RR]([HK 4 ] + [HK 6 ] + [HK 7 ] + [HK 8 ]) - k 7 [RRp]([H K t ] + [HK 2 ] + [HK 3 ] + [HK 5 ]) - k 9 [RRp] 



e 4 = {ei,M(0)(l-maxxt-0.2),M(0)(minx t -0.2)} 
where H(0) is the Heaviside function ensuring the distance is positive. 

D. 3 Priors 

The priors on all variables were distributed as U(0, 1000). 

E Stochastic genetic toggle switch 

E. l Models 

We modeled each toggle switch using a continuous time Markov jump process which obeys the chemical 
master equation. We neglected processes at the RNA level and just modeled at the protein level. 
This makes the models simpler while retaining all the relevant behavior. In all the following gA, gB 
represent the gene promotor for protein A, B respectively and are fixed at one copy. A — gB, B — gA 
represent the bound transcription factors and S, R represent the switch and reset signals. Because the 
concentration of these are fixed they have the effect of removing A and B from the system respectively. 
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The models were simulated in the range < t < 200 
Design 1 





fci 


gA + A 


A 







gB 




gB + B 


r> 
U 




(A 


A + gB 




A-gB 


B + gA 


kfi^ 


B-gA 


A-gB 


k T> 


A + gB 


B-gA 




B + gA 


S + A 


fc9 


S-A 


R + B 


fcio 


> R- B 



Design 2 

Here, in addition to the species in design 1, we have introduced the P species, fixed to be one copy, 
which ensures only A or B can be bound at any one time 



gA 


fci 


gA + A 


A 


k 2 





gB 


k3^ 


gB + B 


B 


kj^ 





A + gB + P 


fcs^ 


A-gB 


B + gA + P 




B-gA 


A-gB 


k 7> 


A + gB + P 


B-gA 


fcs 


B + gA + P 


S + A 


kg^ 


S-A 


R + B 


kio 


vR-B. 



Design 3 

Here we have the same reactions in design 1 but include two extra reactions for the decay of the bound 
proteins 

A-gB^hgB 
B-gA^gA. 

Design 4 

Here we have the same reactions in design 1 but include two extra reactions for the binding / unbinding 
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of the proteins A and B 



A + B A- B 
A- B A + B. 

E.2 Distance 

The two component distance metric was defined to be e = {di, cfe}, 



EteaOs* - y) 2 

n a 

71(3 

where xt is the number of protein B at time t, y is the target (here fixed at 40), a = {t : 30 < t < 60}, 
(3 = {t : < t < 10 and 80 < t < 100}, n a = #a and ng = #/3. The final population was defined to 
be e = {7.0,0.05}. 

E.3 Priors 

The priors for production, binding and interaction rates were distributed as U (0, 50) and the priors 
for the degradation rates were given U(0, 5) distributions. 



di = 
d 2 = 
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Figure SI: Biochemical adaptation: principal component analysis of the posterior distribution for 
model 11 in the case of no cooperativity. 



26 



nFB 

II 1 1 1 1 1 
0.0 0.6 

nAC 

*0 ! "X OB O » «S> «*» » & O © o O O • 

lllllll 

0.0 0.6 

nIA 

& ! ~\ o> » «a> @» s ^ a 6 <&a <&» <o e> >a » 

1 1 1 1 1 n 

0.0 0.6 

nBA 

<#-> 0= I* co «a o <9 CW, <a» o fl :aB. O 

1 1 1 1 1 1 1 

0.0 0.6 

nFC 

^ mm fee es f^-v. «T~> o ,o o= <n ft» «=» 6* mm » o 

1 1 1 1 1 1 1 

0.0 0.6 

nAB 

Q Q Qg, Q^- Gp OS* ^ e Qt * G3 9 & 9 

i 1 1 1 1 1 1 

0.0 0.6 

k AB 

a a. 8, £ 8 (3 of SO <Ss> of i» O ( 

1 1 1 1 1 

-1.0 

kJA 

ag, « a ff» g= 9 e=» c=> 9 9' ^» • S9 f • <s> o 

li 1 1 1 

-1.0 

k AC 

=11 9 &p BB sD ! o 6 *® «*0 -Ob © 6 

1 1 1 1 1 

-1.0 

k_BA 

"5Z> O C7» CP" C3=> «S> *» i/^ (7 <3D> V O « O 

m 1 1 
-1.0 

k FB 

I I I I I 
-1.0 

k FC 

© o tti n & so* rb e,s o °s # q i 

i i i i i 

-1.0 

KAB 

- ^ ^ ^ 9 - g 3 ""g"S»hA®» • 8 "? 

iiiiii 

-3 1 

K_AC 

^ O St. «?• fe» © O © O G (33 £2^ Q O 45 B 

iiiiii 

-3 1 

- «=> cm c= U _ ^ _ ^ K_FB ^ 



-3 1 

K FC 

* ItttA 



3 1 



K IA 

,-.<o o QSP a CS> ^» • • ^3 ° O I ° 

llllll 

-3 1 

K BA 

-3 1 

Figure S2: Biochemical adaptation: posterior distribution for model 4 in the case when cooperativity 
is included. 



27 



I 






SOd 






OJU 






1 ad' 1 
vau 

OV" 




^^^^^H BVi i 


vi ■ 


| VI ^1 




Od >l 




Od >t 

□ ad"M 






| ad i 


1 


va n 

J va >i 




1 ov « 


1 


OV si 

av n 

1 av n 


D 













□d^ 
van 

^^^B av" 








1 vr* 

3d X 
□d * 

1 3d"> 


1 


3d « 




1 va > 


1 


va * 




1 ov > 

OV 1 

av~>i 







1 






























1 












1 


1 


ZD 









OV X 

ov n 
av_>i 
av >i 





■ vi 21 

Od_» 

DH 1 





















I 

ZD 




] 



























VI X 




VI 1 




Od >l 




Od 1 




ad M 




ad i 




I va * 




va i 

OV >! 
OV * 

av >i 


1 




av * 
















Z] 







Figure S3: Biochemical adaptation: principal component analysis of the posterior distribution 
model 4 in the case when cooperativity is included. 
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Figure S4: Robust oscillator design: posterior distribution for model 2 to achieve limits cycles via a 
hopf bifurcation. 
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Figure S5: Robust oscillator design: principal component analysis of the posterior distribution 
model 2 to achieve limits cycles via a hopf bifurcation. 
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Figure S6: Two component systems: evolution to the noise reduction behavior. 
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Figure S7: Two component systems: posterior distribution for the unorthodox system to achieve the 
noise reduction behavior. 
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Figure S8: Two component systems: evolution to the signal reproduction behavior. 
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Figure S9: Two component systems: posterior distribution for the orthodox system to achieve the 
signal reproduction behavior. 
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Figure Sll: Stochastic genetic toggle switch: principal component analysis of the posterior distribution 
for model 2. 
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